What makes computational communication science (ir)reproducible?
Universiteit van Amsterdam, Universität Hamburg, GESIS
2023-10-05
the endeavor to understand human communication by developing and applying digital tools that often involve a high degree of automation in observational, theoretical, and experimental research.
Adamic & Glance (2005)
Whatever piece of scholarship finds room in communication associations, journals, departments, and schools is communication research.
Waisbord (2019) - Communication: A Post-Discipline
CCR demands transparent and reproducible research. … As digital data and analysis code can be shared easily, computational research can be at the forefront of the open science philosophy … Most articles in CCR should be accompanied by an online appendix in a form that encourages reproducibility and reusability.
Really?
A result is reproducible when the same analysis steps (1) performed on the same dataset (2) consistently produce the same answer (3). This reproducibility of the result can be checked (4) by the original investigators and other researchers within a local computational environment.
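Criteria (3) and (4) can be pictured as a rerun-and-compare check. The following is a minimal sketch, not the procedure used in this study: `same_answer` is a hypothetical helper that runs one command twice and compares what it prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the study): run the same command twice
# and report whether both runs print exactly the same output.
same_answer() {
  local first second
  first="$("$@")"    # first run of the analysis step
  second="$("$@")"   # second run: same command, same inputs
  if [ "$first" = "$second" ]; then
    echo "same"
  else
    echo "different"
  fi
}

# Usage with a stand-in "analysis" that always prints the same estimate.
same_answer printf '%s\n' "beta = 0.42"   # prints "same"
```

A real check would compare result files or tables rather than printed text, but the logic is the same: identical steps, identical data, identical answer.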
CCR papers are computationally reproducible.
? / 30
rver + an optional Python pyenv layer
renv (R) and pipreqs (Python)
docker-compose.yml
version: "3.8"
services:
  repro_4-2:
    build: .
    network_mode: "host"
    environment:
      - SNAPSHOT_DATE=2023-05-25
      - PYTHON_VERSION=3.11.3
    volumes:
      - ./output:/usr/local/src/output # [local path]:[container path]
install_dependencies.R
snap_date <- as.Date(Sys.getenv("SNAPSHOT_DATE"))
repo <- paste0("https://packagemanager.posit.co/cran/", snap_date)
options(repos = c(REPO_NAME = repo))
install_dependencies.sh
sed or patch
bash script inside the container
git init
git remote add origin https://github.com/xxx/yyy
git fetch
git checkout ed5bd12a590b06e9a32058fe2eec57f38cd3f1e2
Rscript install_dependencies.R
unzip Docs.zip
mkdir Data ## undocumented
## Code execution
Rscript 01_data_processing.R
bash 02_glove.sh
Rscript 03_dictionary_generation.R
## 04 hardcodes the plan, and multiprocess is deprecated
sed -i 's/multiprocess/multicore/' 04_sentiment_validation.R
Rscript 04_sentiment_validation.R
Rscript 05_sentiment_scoring.R
cp -r Data /usr/local/src/output
deviations in the decimals or obvious typographical errors
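Allowing for deviations in the decimals amounts to comparing the reported and the reproduced number within a tolerance. A minimal sketch follows. `approx_equal` and its default tolerance of 0.01 are assumptions for illustration, since no numeric threshold is stated here. Obvious typographical errors still need human judgment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a reported and a reproduced number.
# The default tolerance (0.01) is an assumption, not a value from the study.
approx_equal() {
  awk -v a="$1" -v b="$2" -v tol="${3:-0.01}" 'BEGIN {
    d = a - b
    if (d < 0) d = -d            # absolute difference
    if (d <= tol) print "match"
    else          print "mismatch"
  }'
}

approx_equal 0.347 0.351   # prints "match"    (difference 0.004)
approx_equal 0.35 0.53     # prints "mismatch" (difference 0.18)
```

The second call looks like a transposed-digit typo. A tolerance check flags it as a mismatch, so deciding whether it is an "obvious typographical error" stays a manual step.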
Criteria 1 and 2
14 / 30
A B C D E F G H I J K L M N
Articles with code and data
B E G H J M N
Articles for which the provided code can be executed
B E H J M N
Articles which produce the same answer
6 / 30
B
E
H
J
M
N
B (Major code rewrite, missing data for an analysis in the appendix)
E (Minor code rewrite)
H (Major code rewrite)
J (Major code rewrite)
M (No code rewrite, ran for a month)
N (Minor code rewrite, missing data for an analysis in the appendix)
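The counts above translate into shares as follows. This is only a worked calculation on the numbers shown (14 / 30, 6 / 30, and the single article, M, that needed no code rewrite). `share` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print n out of total as a percentage with one decimal.
share() {
  awk -v n="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / total }'
}

share 14 30   # criteria 1 and 2                          -> 46.7%
share 6 30    # same answer                               -> 20.0%
share 1 6     # same answer without any code rewrite (M)  -> 16.7%
```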
00_preprocess.R, 01_validation.R
Makefile, dodo
R CMD BATCH, jupyter nbconvert --execute
WIP, don’t cite
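Numbered scripts such as 00_preprocess.R and 01_validation.R make the execution order explicit, so a small runner can batch-execute them. A minimal sketch, assuming every script carries a two-digit prefix. `run_in_order` is a hypothetical helper and not a substitute for a Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical runner: execute every script named NN_* in a directory,
# in lexical (i.e. numeric-prefix) order, with the given interpreter.
# Stops at the first script that fails.
run_in_order() {
  local dir="$1"; shift            # remaining arguments: interpreter command
  local script
  for script in "$dir"/[0-9][0-9]_*; do
    [ -e "$script" ] || continue   # no matching scripts: nothing to do
    echo "running $(basename "$script")"
    "$@" "$script" || {
      echo "failed: $(basename "$script")" >&2
      return 1
    }
  done
}

# Usage (assumes Rscript is on PATH and all matched files are R scripts):
# run_in_order . Rscript
```

Because the order comes from the file names alone, a reader can see it without opening any script. Mixed pipelines (for example R plus bash steps) would need one interpreter per step, which is where a Makefile becomes the better fit.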